Distributed Query Processing Using Suffix Arrays

Authors

  • Mauricio Marín
  • Gonzalo Navarro
Abstract

Suffix arrays are more efficient than inverted files for solving complex queries in a number of applications related to text databases. Examples arise when dealing with biological or musical data or with texts written in oriental languages, and when searching for phrases, approximate patterns and, in general, regular expressions involving separators. In this paper we propose algorithms for processing in parallel batches of queries upon distributed text databases. We present efficient alternatives for speeding up query processing using distributed realizations of suffix arrays. Empirical results obtained from natural language text on a cluster of PCs show that the proposed algorithms are efficient in practice.
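
As a rough illustration of the kind of search the abstract refers to, the sketch below builds a suffix array over a single in-memory text and answers a batch of phrase queries by binary search. It is not the authors' distributed algorithm: the function names (build_suffix_array, find_occurrences, search_batch) are hypothetical, the construction is the naive sort-based one, and no partitioning across processors is attempted.

```python
from typing import Dict, List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes, sorted lexicographically.

    Naive construction (assumption: small texts only); each comparison
    copies a suffix, so this is far from the efficient builders used in practice.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return every position where `pattern` occurs in `text`, in ascending order."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # All suffixes in sa[start:lo] begin with the pattern.
    return sorted(sa[start:lo])


def search_batch(text: str, sa: List[int], patterns: List[str]) -> Dict[str, List[int]]:
    """Answer a batch of queries sequentially against one suffix array."""
    return {p: find_occurrences(text, sa, p) for p in patterns}


if __name__ == "__main__":
    sample = "to be or not to be"
    index = build_suffix_array(sample)
    print(search_batch(sample, index, ["to be", "be", "xyz"]))
```

Because the pattern is matched against raw text positions rather than word entries, a phrase such as "to be" (which contains a separator) is handled the same way as a single word, which is the property the abstract contrasts with inverted files.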

Similar resources

Distributed text search using suffix arrays

Text search is a classical problem in Computer Science, with many data-intensive applications. For this problem, suffix arrays are among the most widely known and used data structures, enabling fast searches for phrases, terms, substrings and regular expressions in large texts. Potential application domains for these operations include large-scale search services, such as Web search engines, wh...

Scalable Parallel Suffix Array Construction

Suffix arrays are a simple and powerful data structure for text processing that can be used for full text indexes, data compression, and many other applications in particular in bioinformatics. We describe the first implementation and experimental evaluation of a scalable parallel algorithm for suffix array construction. The implementation works on distributed memory computers using MPI, Experi...

Range Median of Minima Queries, Super-Cartesian Trees, and Text Indexing

A Range Minimum Query asks for the position of a minimal element between two specified array-indices. We consider a natural extension of this, where our further constraint is that if the minimum in a query interval is not unique, then the query should return an approximation of the median position among all positions that attain this minimum. We present a succinct preprocessing scheme using onl...

Improved Processing of Path Query on RDF Data Using Suffix Array

RDF is a recommended standard to describe additional semantic information to resources on the Semantic Web. Matono et al. proposed an indexing and query processing scheme for path-based RDF query using a suffix array. In this paper, we indicate some points on the previous approach. We propose an improved indexing and query processing scheme to reduce the binary search space and the overhead cau...

An Indexing Scheme for RDF and RDF Schema based on Suffix Arrays

The Semantic Web is a candidate for the next generation of the World Wide Web. It is anticipated that the number of metadata written in RDF (Resource Description Framework) and RDF Schema will increase as the Semantic Web becomes popular. In such a situation, demand for querying metadata described with RDF and RDF Schema will also increase, and therefore effective query retrieval of RDF data is...

Publication date: 2003